Warp models for registering multi-spectral imagery

ABSTRACT

A method for modeling warp for registration of images includes receiving input warp data and performing a fitting process on the input warp data to produce at least one of reduced noise warp data or reduced noise warp uncertainty. The warp for the at least one of reduced noise warp data or reduced noise warp uncertainty is modeled with components including an offset that varies in time and a non-linear distortion that does not vary with time. The method also includes outputting at the least one of reduced noise warp data or reduced noise warp uncertainty.

STATEMENT REGARDING FEDERALLY SPONSORED RESEARCH OR DEVELOPMENT

This invention was made with government support under a Contract awarded by an agency. The government has certain rights in the invention.

BACKGROUND OF THE INVENTION 1. Field of the Invention

The present disclosure relates to imagery, and more particularly to registration of multiple images such as used in multi-modality and multi-spectral imagery.

2. Description of Related Art

Registration between the different bands of multi-spectral imagery, e.g., acquired by an airborne or space sensor, is particularly challenging as the bands that are spectrally far apart do not correlate very well across the image. For example, bands deep in the IR region have both a reflective and an emissive component whereas the bands in the visual region are purely reflective.

Simple phase correlation to obtain the warp is not very effective in these instances. Similarly, registering images acquired using different modalities such as LiDAR, narrow band spectral, and broad band visual photographic imagery present the same issues.

Conventional methods and systems have generally been considered satisfactory for their intended purpose. However, there is still a need in the art for improved image registration for multi-modal and multi-spectral images. The present disclosure provides a solution for this need.

SUMMARY OF THE INVENTION

A method for modeling warp for registration of images includes receiving input warp data and performing a fitting process on the input warp data to produce at least one of reduced noise warp data or reduced noise warp uncertainty. The warp for the at least one of reduced noise warp data or reduced noise warp uncertainty is modeled with components including an offset that varies in time and a non-linear distortion that does not vary with time. The method also includes outputting at the least one of reduced noise warp data or reduced noise warp uncertainty.

The warp can be modeled with components including a slope that varies in time. Performing the fitting process can include modeling the slope and offset as splines. Performing the fitting process can include modeling the non-linear distortion as a polynomial.

The method can include receiving input warp data uncertainty for the input warp data, wherein performing the fitting process includes using the input warp data uncertainty in producing the reduced noise warp data. Performing the fitting process can include correcting errors in the input warp data uncertainty by performing a robust fitting process on the input warp data.

The method can include predicting reduced noise warp uncertainty and utilizing the reduced noise warp uncertainty to generate a refined warp by incorporating motion data, and registering a warped band using the refined warp. It is also contemplated that the method can include predicting reduced noise warp uncertainty using the input warp data uncertainty and utilizing the reduced noise warp uncertainty to generate a refined warp by incorporating motion data, and registering a warped band using the refined warp.

A sensor model can be used to produce warp data from motion and motion warp uncertainty, wherein the refined warp is produced by combining the reduced noise warp, the reduced noise warp uncertainty, the warp data from motion, and the motion warp uncertainty. Combining the reduced noise warp, the reduced noise warp uncertainty, the warp data from motion, and the motion warp uncertainty can include obtaining the refined warp {circumflex over ({right arrow over (h)})} as {circumflex over ({right arrow over (h)})}(Σ_({circumflex over (d)}) ⁻¹+Σ_(ĝ) ⁻¹)⁻¹(Σ_({circumflex over (d)}) ⁻¹{circumflex over ({right arrow over (d)})}+Σ_(ĝ) ⁻¹{circumflex over ({right arrow over (g)})}) where {circumflex over ({right arrow over (d)})} is the reduced noise warp, Σ_({circumflex over (d)}) is the reduced noise warp uncertainty, {circumflex over ({right arrow over (g)})} denotes the warp data from motion, and Σ_(ĝ) denotes the motion warp uncertainty.

The method can also include:

using the input warp data uncertainty to produce relative input warp data uncertainty;

using the relative input warp data uncertainty in a warp model robust fit to produce fit weights;

using the fit weights and relative input warp data uncertainty to produce corrected relative input warp data uncertainty;

predicting relative reduced noise warp data uncertainty from the corrected relative input warp data uncertainty;

using the reduced noise warp data and the input warp data to estimate input noise variance; and

multiplying the relative reduced noise warp data uncertainty and the input noise variance to produce the reduced noise warp data uncertainty.

A smoothing model for slope and offset spline coefficients can be incorporated in the fitting process. Incorporating a smoothing model in the fitting process can include minimizing curvature of the offset and slope splines. The splines can have non-uniformly placed knots. The splines can be cubic.

The fitting process can include modeling the warp as d _(x)(x,y)=s _(y)(y)x+c _(x)(y)+o _(x)(x) d _(y)(x,y)=s _(y)(y)x+c(y)+o _(y)(x). where d_(x)(⋅,⋅) and d_(y)(⋅,⋅) are x and y components of the warp, respectively, s_(x)(⋅) and c_(x)(⋅) denote slope and offset as a function of time for the x component of the warp, s_(y)(⋅) and c_(y)(⋅) denote the slope and offset as a function of time for the y component of the warp, and o_(x)(⋅) and o_(y)(⋅) denote distortion that is constant in time but varies along a sensor array, and y in the functions denotes time and x in the functions denotes distance along the sensor array.

A method for registration of images includes receiving a reference image and a warped image, correlating the reference and warped images to produce noisy warp data, and performing a fitting process on the noisy warp data to produce reduced noise warp data. The warp for the reduced noise warp data is modeled with components including an offset that varies in time, and a non-linear distortion that does not vary with time. The method also includes resampling the warped image using the reduced noise warp data to produce a registered image that is registered to the reference image.

Correlating the reference and warped images can include producing warp data uncertainty. Performing the fitting process can include generating reduced noise warp uncertainty. The method can include receiving motion data and combining the motion data with the reduced noise warp and the reduced noise warp uncertainty to produce a refined warp, wherein resampling the warped image using the reduced noise warp data includes using the refined warp to produce the registered image.

A system includes a module configured to implement machine readable instructions to perform any embodiment of the methods described above.

These and other features of the systems and methods of the subject disclosure will become more readily apparent to those skilled in the art from the following detailed description of the preferred embodiments taken in conjunction with the drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

So that those skilled in the art to which the subject disclosure appertains will readily understand how to make and use the devices and methods of the subject disclosure without undue experimentation, preferred embodiments thereof will be described in detail herein below with reference to certain figures, wherein:

FIGS. 1A-1D are a set of graphs showing spline basis functions in accordance with an exemplary embodiment of the present disclosure;

FIG. 2 is a graph showing three exemplary functions for assigning weights to data samples in accordance with an exemplary embodiment of the present disclosure;

FIG. 3 is a data flow diagram of an exemplary embodiment of image registration in accordance with the present disclosure;

FIG. 4 is a data flow diagram of an exemplary embodiment of a method of modeling warp for image registration in accordance with the present disclosure;

FIG. 5 is a data flow diagram of uncertainty estimation in accordance with an embodiment of the present disclosure; and

FIG. 6 is a data flow diagram for an exemplary embodiment of estimating warp with motion data.

DETAILED DESCRIPTION OF THE PREFERRED EMBODIMENTS

Reference will now be made to the drawings wherein like reference numerals identify similar structural features or aspects of the subject disclosure. For purposes of explanation and illustration, and not limitation, a partial view of an exemplary embodiment of a method in accordance with the disclosure is shown in FIG. 3 and is designated generally by reference character 10. Other embodiments of methods in accordance with the disclosure, or aspects thereof, are provided in FIGS. 1A-1D, 2 and 4-6, as will be described. The systems and methods described herein can be used for multi-modal/multi-spectral image registration.

1 Separable Spline Warp Model

Let {x_(w), y_(w),} denote the coordinates in a warped band image that correspond to the coordinates {x, y} in a reference band image. Let d _(x)(x,y)=x _(w)(x,y)−x  (1) d _(y)(x,y)=y _(w)(x,y)−y  (2) denote the space varying translation between the warped and the reference band. Assume the images of the reference and warped bands are acquired by two linear sensor arrays displaced from each other spatially. The sensor arrays are mounted on an airborne or space platform and the images are obtained by scanning the arrays in time as the platform moves. The platform motion may not be perfectly uniform in time and this causes a time-dependent warping between the bands. We will use x for the direction along the sensor arrays (fast-scan) and the y for the time dimension (slow-scan.) Given the rigid geometry in the x direction, we expect the warping to be linear in this direction unless there is uncorrected optical distortion. The linear warping in x will be time dependent but the optical distortion will be constant in time. Let s_(x)(⋅) and c_(x)(⋅) denote the slope and offset as a function of time for the linear mapping of the warp in the x component. Similarly, let s_(y)(⋅) and c_(y)(⋅) denote the slope and offset for the y component of the warp in time. Let o_(x)(⋅) and o_(y)(⋅) denote the relative optical distortion between the warped and the reference band along the sensor array for the x and y component respectively. We model the space varying translation between the bands as follows d _(x)(x,y)=s _(x)(y)x+c _(x)(y)+o _(x)(x) d _(y)(x,y)=s _(y)(y)x+c _(y)(y)+o _(y)(x).  (3) Note that this is nothing but a Taylor series expansion of the delta warp d(x, y) along the x direction with the zeroth and first order coefficients as functions of y (time) and the higher order coefficients constant in time.

In the subsequent treatment, we will use v={x, y} as an index for x or y. Since the equations are similar for the translations in x and y, this makes for compact notation. Assuming the platform motion is more or less steady, the slope and offset, s_(*)(⋅) and c_(*)(⋅), will be smooth functions of time, y. We choose to model them as B-splines and they can written as a summation of a set of basis functions as

$\begin{matrix} {{{s_{v}(y)} = {\sum\limits_{j = 1}^{n_{s}}{s_{vj}{B_{j,k}^{s}(y)}}}},} & (4) \\ {{{c_{v}(y)} = {\sum\limits_{j = 1}^{n_{c}}{B_{j,k}^{c}(y)}}},} & (5) \end{matrix}$ where n_(s) and n_(c) are the number of basis functions that are chosen for the slope and offset functions respectively. The number of knots and the spline order k determines the number of basis functions. See Sec. 1.1 for generating the set of basis functions B_(j,k)(⋅), j=1 . . . n given the order and knot locations.

The optical distortion can either be modeled as a B-spline or a polynomial

$\begin{matrix} {{{\sum\limits_{j = 1}^{n_{o}}{o_{vj}{B_{j,k}^{o}(x)}\mspace{14mu}{or}\mspace{14mu}{o_{v}(x)}}} = {\sum\limits_{j = 1}^{n_{o}}{o_{vj}x^{e_{j}}}}},} & (6) \end{matrix}$ where n_(o) denotes the number of basis functions and e_(j), j=1, . . . ,n_(p), denotes the n_(o) exponents chosen for the polynomial basis. In the subsequent treatment, we will use B_(j,k) ^(o)(⋅) either for the spline basis or the polynomial basis. 1.1 Recursive Generation of B-Spline Basis Functions

To generate the B-spline functions, the number and location of the knots need to be specified along with the order of the spline. Let the knots sequence be denoted as t₁≤t₂≤ . . . ≤t_(n), where n is the number of knots. Let k denote the order of the spline where k=1 denotes the 0^(th) order spline. The B-spline basis functions for any order k is generated recursively from the previous order basis functions. The knots sequence is augmented for each order k as follows

$\begin{matrix} {t^{(k)} = {\left\{ {\underset{\underset{k\mspace{14mu}{times}}{︸}}{t_{1},\ldots\mspace{14mu},t_{1}},t_{2},\ldots\mspace{14mu},t_{n - 1},\underset{\underset{k\mspace{14mu}{times}}{︸}}{t_{n},\ldots\mspace{14mu},t_{n}}} \right\}.}} & (7) \end{matrix}$ The recursion of the B-spline at any order k>1 is given as

$\begin{matrix} {{{B_{j,k}(x)} = {{\frac{x - t_{j}^{(k)}}{t_{j + k - 1}^{(k)} - t_{j}^{(k)}}{B_{{j - 1},{k - 1}}(x)}} + {\frac{t_{j + k}^{(k)} - x}{t_{j + k}^{(k)} - t_{j + 1}^{(k)}}{B_{j,{k - 1}}(x)}}}},{j =},\ldots\mspace{14mu},{n + k - 2},{k > 1},} & (8) \end{matrix}$ and is initialized at k=1 as

$\begin{matrix} {{B_{j,1}(x)} = \left\{ {\begin{matrix} 1 & {{t_{j} \leq x < t_{j + 1}},{j = 1},\ldots\mspace{14mu},{n - 2}} \\ 1 & {{t_{j} \leq x \leq t_{j + 1}},{j = {n - 1}}} \\ 0 & {otherwise} \end{matrix},} \right.} & (9) \end{matrix}$ FIGS. 1A-1D show this recursion from k=1 to k=4 to generate the cubic splines (k=4) for a non-uniformly spaced knots sequence [0,0.25,0.4,0.8,1]. 1.2 Warping model parameter estimation

The unknown spline coefficients for slope, offset, and optical distortion need to estimated. This is achieved by fitting the warping model Eq. (3) to the warping data obtained by the image correlation algorithm at a discrete set of tie points. To obtain the warp data at the tie points, methods such as phase correlation on local images of the warped band and the master band centered on the tie points may be empolyed. Such methods are known to those skilled in the art. Let {x_(i), y_(i)}, i=1, . . . , m denote a set of m grid points at which the warp has been estimated. To facilitate the estimation of the unknown coefficient, we rewrite Eq. (3) in vector-matrix notation d=Ap,  (10) where

$\begin{matrix} {A = \begin{bmatrix} {{B_{1,k}^{s}\left( y_{1} \right)}x_{1}} & \ldots & {{B_{n_{s},k}^{s}(y)}x_{1}} & {B_{1,k}^{c}\left( y_{1} \right)} & \ldots & {B_{n_{c},k}^{c}\left( y_{1} \right)} & {B_{1,k}^{o}\left( x_{1} \right)} & \ldots & {B_{n_{o},k}^{o}\left( x_{1} \right)} \\ \vdots & \ddots & \vdots & \vdots & \ddots & \vdots & \vdots & \ddots & \vdots \\ {{B_{1,k}^{s}\left( y_{m} \right)}x_{m}} & \ldots & {{B_{n_{s},k}^{s}\left( y_{m} \right)}x_{m}} & {B_{1,k}^{c}\left( y_{m} \right)} & \ldots & {B_{n_{c},k}^{c}\left( y_{m} \right)} & {B_{1,k}^{o}\left( x_{1} \right)} & \ldots & {B_{n_{o},k}^{o}(x)} \end{bmatrix}} & (11) \end{matrix}$ is a m×n matrix, where n_(p)=n_(s)+n_(c)+n_(o), and

$\begin{matrix} {{p = \begin{bmatrix} s_{x\; 1} & s_{y\; 1} \\ \vdots & \vdots \\ s_{{xn}_{s}} & s_{{yn}_{s}} \\ c_{x\; 1} & c_{y\; 1} \\ \vdots & \vdots \\ c_{{xn}_{c}} & c_{{yn}_{c}} \\ o_{x\; 1} & o_{y\; 1} \\ \vdots & \vdots \\ o_{{xn}_{o}} & o_{{yn}_{o}} \end{bmatrix}},{d = \begin{bmatrix} {d_{x}\left( {x_{1},y_{1}} \right)} & {d_{y}\left( {x_{1},y_{1}} \right)} \\ \vdots & \vdots \\ {d_{x}\left( {x_{m},y_{m}} \right)} & {d_{y}\left( {x_{m},y_{m}} \right)} \end{bmatrix}},} & (12) \end{matrix}$ are n_(p)×2 and m×2 matrices respectively. The matrix p has all the spline coefficients, the matrix d has all the data, and the matrix A is constructed based on the model structure. The spline basis functions have a compact local support and this makes the matrix A to be sparse. Note that bold case letters will be used to represent matrices. Subscripts on matrices will be used to identify rows (first index) and columns (second index) with denoting all indices. For example, A_(i*), denotes the i^(th) row and A_(*j) denotes the 1^(th) column of matrix A. The first and second columns of p and d matrices contain the model coefficients and data for x and y respectively. Hence we will also use p_(*v) or d_(*v) with v taking values {x, y} to denote the column vectors corresponding to x and y. 1.2.1 Least Squares Estimation of Spline Coefficients

Equation (10) compactly represents the warp model evaluated at the tie point grid where warping data is available either through image correlation or some other means. In general, the data from the image correlation process will be noisy either due to measurement process or correlation model mismatch as the image content may not match very well between the warped band and the master band. We would like to adjust the unknown spline coefficient to minimize the mean square error between the model predictions and the data. The least-squares estimate can be obtained in closed form and is given as {circumflex over (p)}=(A ^(t) A)⁻¹ A ^(t) A  (13) 1.2.2 Weighted Least Squares Estimation of Spline Coefficients

The solution of Eq. (13) assumes that the noise is uniformly distributed across all the m data points and that the two components in x and y are independent. However, depending on how the data values are obtained for the warping, the noise may be quite variable between the data points and also coupled between the two components. Let W(i) denote the 2×2 weighting matrix for data point i based on the confidence we have on the estimate for that point. The inverse of the noise covariance matrix can be chosen as the weighting matrix. The weighted mean square error across all m data points in vector-matrix notation is given as

$\begin{matrix} {{C = {\sum\limits_{i = 1}^{m}{\left( {{{A(i)}\overset{\rightarrow}{p}} - d_{i^{*}}^{t}} \right)^{t}{W(i)}\left( {{{A(i)}\overset{\rightarrow}{p}} - d_{i^{*}}^{t}} \right)}}},} & (14) \end{matrix}$ where

$\begin{matrix} {{A(i)} = \begin{bmatrix} A_{i^{*}} & 0 \\ 0 & A_{i^{*}} \end{bmatrix}} & (15) \end{matrix}$ is a 2×2n matrix and

$\begin{matrix} {\overset{\rightarrow}{p} = \begin{bmatrix} p_{*_{x}} \\ p_{*_{y}} \end{bmatrix}} & (16) \end{matrix}$ is a 2_(n) _(p) ×1 vector containing the spline coefficient for both components. The spline coefficients that minimize the mean square error C is obtained by differentiating Eq. (14) with respect to {right arrow over (p)} and setting it equal to zero

$\begin{matrix} {{\left. \frac{\partial C}{\partial\overset{\rightarrow}{p}} \right|_{\overset{\rightarrow}{p} = \hat{\overset{\rightarrow}{p}}} = 0}{{\sum\limits_{i = 1}^{m}\;{2{A^{t}(i)}{W(i)}\left( {{{A(i)}\hat{\overset{\rightarrow}{p}}} - d_{i^{*}}^{t}} \right)}} = 0}{{\sum\limits_{i = 1}^{m}\;{{A^{t}(i)}{W(i)}{A(i)}\hat{\overset{\rightarrow}{p}}}} = {\sum\limits_{i = 1}^{m}\;{{A^{t}(i)}{W(i)}d_{i^{*}}^{t}}}}{\hat{\overset{\rightarrow}{p}} = {\left( {\sum\limits_{i = 1}^{m}\;{{A^{t}(i)}{W(i)}{A(i)}}} \right)^{- 1}\left( {\sum\limits_{i = 1}^{m}\;{{A^{t}(i)}{W(i)}d_{i^{*}}^{t}}} \right)}}} & (17) \end{matrix}$ 1.2.3 Special Case for Uncorrelated Components

If the x and y components are uncorrelated (weighting matrices W(i) are diagonal), then Eq. (17) can be simplified since the spline coefficient for x and y can be estimated independently. Let

$\begin{matrix} {{W_{x} = \begin{bmatrix} W_{11}^{(1)} & \; & 0 \\ \; & \ddots & \; \\ 0 & \; & W_{11}^{(m)} \end{bmatrix}},{W_{y} = \begin{bmatrix} W_{22}^{(1)} & \; & 0 \\ \; & \ddots & \; \\ 0 & \; & {W_{22}^{(m)},} \end{bmatrix}}} & (18) \end{matrix}$ be the m×m diagonal weighting matrices for x and y components respectively. Then the weighted least-squares solution for the two components is given as {circumflex over (p)} _(*v)=(A ^(t) W _(v) A)⁻¹ A ^(t) W _(v) d _(*v) ,v={x,y}  (19) Note that until now we have assumed the basis functions are the same for x and y components. This restriction can be easily relaxed by using a different matrix A for the x and y components. For the coupled solution given by Eq. (17), the matrix A(i) can be modified as

$\begin{matrix} {{{A(i)} = \begin{bmatrix} A_{x_{i^{*}}} & 0 \\ 0 & A_{y_{i^{*}}} \end{bmatrix}},} & (20) \end{matrix}$ where A_(x) and A_(y) denotes the A matrix formed from the basis functions for x and y respectively. 1.2.4 Numerical Computation Considerations

The warping model matrix A can end up being badly scaled and ill-conditioned leading to a host of numerical issues in the solution of the model spline coefficients. The problem arises because the range of x and y may be large and quite different from each other. The large range of x makes the basis functions for the optical distortion model given by Eq. (6) to have a very large dynamic range if the polynomial model is chosen. With finite precision, the matrices will become rank deficient making it difficult to fit the data correctly. To overcome this problem, it is advisable to first normalize the range of the x and y between 0 and 1. The warping model is then fitted in the normalized domain and the results scaled back to the original domain for subsequent processing.

Also note the solutions in Eqs. (13), (17), and (19), require a matrix inverse. Although they have been written in that form, it is not advisable to compute the explicit inverse both from a numerical accuracy and performance considerations. Instead, it is better to take the inverted matrix to the other side of the equation and solve the resulting set of linear equations using Gaussian elimination.

1.2.5 Robust Weighted Least Squares

The previous Sections assumes that the noise in the data can be modeled as Gaussian. However, in practice if the data are generated using an image correlation procedure, the noise is very much image content dependent. Furthermore, if the image registration is being done between two different imaging modalities or very different spectral bands, the image correlation model may break down. Both of these factors can lead to large impulsive kind of noise that is not very well modeled by a unimodal normal distribution. The quadratic penalty resulting from the Gaussian assumption tends to be overly sensitive to outliers and the resulting least-squares solution can end up significantly biased. Towards this end, we would like to replace the quadratic penalty with alternative penalty functions that reduce the cost the model associates with large outlying values in the data. The optimal shape of the penalty function is given by actual probability distribution function of the noise in the data. In practice, it is difficult to come up with an accurate description of the noise. However, any penalty function that reduces the cost associated with large outlying values will tend to perform better in the case the data is corrupted by impulsive or heavier tail distributions.

One advantage of the weighted least squares method is that the solution comes out to be closed form. This will not be case if an alternative robust penalty function is employed. However, we can leverage the closed form solution for weighted least squares to come up with an iterative solution that assigns varying weights to each data point based on a determination of how well it fits the Gaussian distribution assumption. The weights vary during the iterative process as the model moves to fit the normally distributed data points and reject the outlying data values that do not fit the model very well.

1.2.6 Solution for uncorrelated components

This Section lays out the robust solution when the x and y components are uncorrelated. In this case, the model fitting in x and y can be carried out independently. For notational simplicity, we will drop the subscripts x and y on the matrices in this Section and give the robust solution for the general problem A{right arrow over (p)}={right arrow over (d)}  (21) where A is the model matrix, {right arrow over (d)} is the data vector and {right arrow over (p)} is the vector of unknown model coefficients. Note that the general formulation given by Eq. (21) includes the case where we have a weighting matrix W. The effect of the weighting is incorporated in the model matrix and the data vector as follows: A→W ^(1/2) A,{circumflex over (d)}→W ^(1/2) {right arrow over (d)}.  (22)

The robust fitting method proceeds by assigning weights to each of the data points at each iteration based on its determination of how likely the point is an outlier or not. Let l denote the iteration index and let w_(i) ^((l)) denote the weight associated with the i^(th) data point at iteration l. The weights are all initialized to one at the first iteration, w_(*) ⁽¹⁾=1. The weights at each iteration l are put into a diagonal weighting matrix denoted as W^((l)). The solution at iteration l is given as {circumflex over ({right arrow over (p)})}^((l))=(A ^(t) W ^((l)) A)⁻¹ A ^(t) W ^((l)) {right arrow over (d)}  (23) The model fitting error is given by the residual {right arrow over (r)} ^((l)) =A{circumflex over ({right arrow over (p)})}^((l)) −{right arrow over (d)}  (24) It can be shown that the covariance of the residuals is given as E[{right arrow over (r)}{right arrow over (r)} ^(t)]=(I−P)σ²  (25) where σ² is the variance of the noise in the data and P is the projection matrix P=A(A ^(t) A)⁻¹ A ^(t)  (26)

Given the result of Eq. (25), we can normalize the residuals at each iteration to obtain a unit variance random variable

$\begin{matrix} {{{\overset{\_}{r}}_{i}^{(l)} = \frac{r_{i}^{(l)}}{\sqrt{1 - P_{ii}}{\hat{\sigma}}^{(l)}}},} & (27) \end{matrix}$ where {circumflex over (σ)}^((l)) is an estimate of the noise standard deviation at iteration l. The quantity (1−P_(ii)) is known as the leverage for data sample i. We need to obtain a robust estimate of σ in the presence of outliers. The ML estimate given by the root mean square error is too susceptible to large outlying values. Instead, we choose the median of the absolute deviation as our robust estimate

$\begin{matrix} {{\hat{\sigma}}^{(l)} = {1.4826\mspace{14mu}{{{median}\left( \left| \frac{r_{i}^{(l)}}{\sqrt{1 - P_{ii}}} \right| \right)}.}}} & (28) \end{matrix}$ The factor 1.4826 is chosen such that if the samples are drawn from a Gaussian distribution with no outliers, the robust estimate is equal to the ML estimate.

The normalized residuals are then passed through a weighting function ρ(⋅) that assigns weights to each data sample. For robustness, we would like to dial down the weights as the normalized residual gets large signaling an outlier. There are numerous functions in the literature that achieve this purpose. FIG. 2 shows three of the most commonly used functions for this purpose. Large residual error signals a model misfit and is given lower weight in the fitting process to achieve robustness to outliers. We will use the bisquare function given as

$\begin{matrix} {{\psi(r)} = \left\{ {\begin{matrix} \left( {1 - r^{2}} \right)^{2} & \left| r \middle| {< 1} \right. \\ 0 & {otherwise} \end{matrix}.} \right.} & (29) \end{matrix}$ The weights at each iteration l are obtained as w _(i) ^((l))=ψ( r _(i) ^((l))/4.685)  (30) where scaling factor 4.685 has been chosen to obtain 95% efficiency with respect to the ordinary least square estimate when no outliers are present. This choice basically sets the weights of sample data that is more the 4.685 standard deviations away to zero. Note that the choice of the scaling factor depends on the form of the weighting function and will be different for the Huber and the Cauchy functions shown in FIG. 2.

The procedure outlined in this Section is repeated until the estimate of the unknown model coefficients given by Eq. (23) converges or the maximum number of iterations are exceeded.

1.3 Smoothing model

The smoothness of the slope and offset profiles in time is implicitly determined by the choice of the number of spline coefficients and their placement. A small number of evenly spaced knots constrains the profiles to be smooth. Increasing the number of knots frees up the profiles to change more quickly. Ideally, we would like the splines to be flexible enough to fit all the good data. However, choosing a large number of knots can make the results vary wildly in areas where there is no data to constrain the solution. The numerical solutions proposed in Section 1.2 do not guarantee a minimum norm solution.

The implicit specification of smoothing is undesirable since it shifts the difficultly to optimal estimation of the number and placement of knots. Instead, we use an explicit smoothing model in the optimization problem. The original cost function minimized the mean square error between the model and the data. We add to this function a roughness penalty that will favor smoother solutions. We formulate the roughness penalty as the integrated square of the second derivative of the slope and offset functions given as

$\begin{matrix} {\left. {\lambda_{s}\int} \middle| {\frac{d^{2}}{{dy}^{2}}{s_{v}(y)}} \middle| {}_{2}{{dy} + {\lambda_{c}\int}} \middle| {\frac{d^{2}}{{dy}^{2}}{c_{v}(y)}} \middle| {}_{2}{dy} \right.,{v = \left\{ {x,y} \right\}}} & (31) \end{matrix}$

where λ_(s) and λ_(c) denote the weights assigned to the roughness terms for the slope and offset respectively. Using Eqs. (4) and (5), we obtain

$\begin{matrix} {{{\frac{d^{2}}{{dy}^{2}}{s_{v}(y)}} = {\sum\limits_{j = 1}^{n_{s}}\;{s_{vj}\frac{d^{2}}{{dy}^{2}}{B_{j,k}^{s}(y)}}}},} & (32) \\ {{\frac{d^{2}}{{dy}^{2}}{c_{v}(y)}} = {\sum\limits_{j = 1}^{n_{c}}\;{c_{vj}\frac{d^{2}}{{dy}^{2}}{{B_{j,k}^{c}(y)}.}}}} & (33) \end{matrix}$ To make the computation tractable, we approximate the integral in Eq. (31) as a sum by evaluating the second derivatives only at the knot locations. For a cubic spline (k=4) with uniformly distributed knot locations at {t₁ . . . , t_(n)}, we have the following approximate relationship if we ignore edge effects

$\begin{matrix} {{\frac{d^{2}}{{dy}^{2}}{B_{j,k}(y)}} \propto \left\{ \begin{matrix} {- 2} & {y = t_{j}} \\ 1 & {{y = \left\{ {t_{j - 1},t_{j + 1}} \right\}},} \\ 0 & {otherwise} \end{matrix} \right.} & (34) \end{matrix}$ Substituting Eq. (34) in Eqs. (32) and (33), we obtain

$\begin{matrix} {{{\frac{d^{2}}{{dy}^{2}}{s_{v}\left( t_{j} \right)}} \propto {{{- 2}s_{vj}} + s_{{vj} - 1} + s_{{vj} + 1}}},} & (35) \\ {{\frac{d^{2}}{{dy}^{2}}{c_{v}\left( t_{j} \right)}} \propto {{{- 2}c_{vj}} + c_{{vj} - 1} + {c_{{vj} + 1}.}}} & (36) \end{matrix}$ Approximating the integral in Eq. (31) with a discrete sum evaluated at the knots locations given by Eqs. (35) and (36), we obtain

$\begin{matrix} {\int\left| {\frac{d^{2}}{{dy}^{2}}{s_{v}(y)}} \middle| {}_{2}{{dy} \propto {\sum\limits_{j = 2}^{n_{s} - 1}{\quad\;\left| {{2s_{vj}} - s_{{vj} - 1} - s_{{vj} + 1}} \middle| {}_{2}{+ \left| {s_{v\; 1} - s_{v\; 2}} \middle| {}_{2}{+ \left| {s_{{vn}_{s}} - s_{{vn}_{s} - 1}} \right|^{2}} \right.} \right.}}} \right.} & (37) \\ {\int\left| {\frac{d^{2}}{{dy}^{2}}{c_{v}(y)}} \middle| {}_{2}{{dy} \propto {\sum\limits_{j = 2}^{n_{c} - 1}\;{\quad\left| {{2c_{vj}} - c_{{vj} - 1} - c_{{vj} + 1}} \middle| {}_{2}{+ \left| {c_{v\; 1} - c_{v\; 2}} \middle| {}_{2}{+ \left| {c_{{vn}_{c}} - c_{{vn}_{c} - 1}} \middle| {}_{2}. \right.} \right.} \right.}}} \right.} & (38) \end{matrix}$ Note that the expressions for the integrated squared second derivative in eqs. (37) and (38) are only approximate but they are reasonably accurate for our purposes. To enable us to write the roughness penalty in vector-matrix notation, we define

$\begin{matrix} {{D_{n} = {\underset{\begin{matrix} ︸ \\ {ncolumns} \end{matrix}}{\left. \begin{bmatrix} 1 & {- 1} & \; & \; & \; \\ {- 1} & 2 & {- 1} & \; & \; \\ \; & \ddots & \ddots & \ddots & \; \\ \; & \; & {- 1} & 2 & {- 1} \\ \; & \; & \; & {- 1} & 1 \end{bmatrix} \right\}}{nrows}}},} & (39) \end{matrix}$ to be a matrix that operates on a vector of length n and computes the second order finite difference. Note that D_(n) is approximately a Toeplitz matrix with the first and last rows adjusted to sum to zero. Using D_(n), we construct a (n_(s)+n_(c))×n_(p) matrix

$\begin{matrix} {D = \begin{bmatrix} {\sqrt{\lambda_{s}}D_{n_{s}}} & 0_{n_{s} \times n_{c}} & 0_{n_{s} \times n_{o}} \\ 0_{n_{c} \times n_{s}} & {\sqrt{\lambda_{c}}D_{n_{c}}} & 0_{n_{c} \times n_{o}} \end{bmatrix}} & (40) \end{matrix}$ that can directly operate on the model parameter vector p_(*x) or p_(*y) to obtain the second order finite differences of the slope and offset spline coefficients. The notation 0_(n) _(s) _(×n) _(o) denotes a n_(s)×n_(o) matrix of all zeros. The overall cost function to be minimized for the x and y model is given as

$\begin{matrix} \begin{matrix} {C_{v} = {\underset{\underset{dataterm}{︸}}{\left( {{Ap}_{*_{v}} - d_{*_{v}}} \right)^{t}\left( {{Ap}_{*_{v}} - d_{*_{v}}} \right)} + \underset{\underset{smoothnessterm}{︸}}{\left( {Dp}_{*_{v}} \right)^{t}\left( {Dp}_{*_{v}} \right)}}} \\ {{= {\left( {{\overset{\_}{A}p_{*_{v}}} - {\overset{\_}{d}}_{*_{v}}} \right)^{t}\left( {{\overset{\_}{A}p_{*_{v}}} - {\overset{\_}{d}}_{*_{v}}} \right)}},{v = \left\{ {x,y} \right\}},} \end{matrix} & (41) \end{matrix}$ where

$\begin{matrix} {\overset{\_}{A} = \begin{bmatrix} A \\ D \end{bmatrix}} & (42) \end{matrix}$ denotes the model matrix augmented by the second derivative matrix and

$\begin{matrix} {\overset{\_}{d} = \begin{bmatrix} d \\ 0_{{({n_{s} + n_{c}})} \times 2} \end{bmatrix}} & (43) \end{matrix}$ denotes the data augmented by zeros (constraints for second derivative.) Minimizing the cost function C_(v) with respect to the parameters yields

$\begin{matrix} {\hat{p} = {\left( {{\overset{\_}{A}}^{t}\overset{\_}{A}} \right)^{- 1}{\overset{\_}{A}}^{t}\overset{\_}{d}}} & (44) \\ {\mspace{14mu}{= {\left( {{\overset{\_}{A}}^{t}\overset{\_}{A}} \right)^{- 1}A^{t}d}}} & (45) \end{matrix}$

If weights (inverse noise covariance) are available for the data points, the cost function is modified as C _(v)=(Ap _(*v) −d _(*v))^(t) W _(v)(Ap _(*v) −d _(*v))+(Dp _(*v))^(t)(Dp _(*v)),v={x,y}  (46) The optimal solution is given as {circumflex over (p)} _(*v)=(Ā ^(t) W _(v) Ā)⁻¹ Ā ^(t) W _(v) d _(v) ,v={x,y}  (47) where W _(v) denotes the weight matrix augmented by the weights for the smoothing constraints

$\begin{matrix} {{{\overset{\_}{W}}_{v} = \begin{bmatrix} W_{v} & 0 \\ 0 & I_{n_{s} + n_{c}} \end{bmatrix}},{v = {\left\{ {x,y} \right\}.}}} & (48) \end{matrix}$ Here I_(n) denotes an identity matrix of size n×n.

To obtain a robust solution, the prescription given in Sec. 1.2.6 can be followed using the augmented model matrix Ā and data vector d with the user supplied weights folded in. Note that this will treat the additional n_(s)+n_(c) constraints given by matrix D as data points and obtain robust weights for them depending on how well the constraints can be satisfied. If this is not desirable, then a procedure similar to that outlined in Sec. 1.2.6 can be formulated that obtains robust weights for only the data points and then computes the model parameters using the robust weights and the smoothing model.

1.4 Uncertainty Prediction of Model Output

In this Section, the subscript v denoting x or y will be dropped for notational simplicity as the equations are same for both x and y. E[·] denotes the expectation operator and bold case is used to specify the random variables over which the expectation is computed and for denoting matrices. Since the expectation in this Section is always computed on vectors (denoted with {right arrow over ( )}), there should be no confusion between the two. Estimated quantities will be denoted by a ^ and matrices and vectors augmented by the smoothing constraints denoted by a .

We are interested in computing the expected mean square error (MSE) between the warp model predictions {circumflex over ({right arrow over (d)})} and the ground truth warp {right arrow over (d)}_(t) as a measure of accuracy of the model output. In practice, the ground truth warp is not available but it will not be required if the model predictions are unbiased as shown below

$\begin{matrix} \begin{matrix} {{{MSE}\left( \hat{\overset{\rightarrow}{d}} \right)} = {E\left\lbrack {\left( {\hat{\overset{\rightarrow}{d}} - {\overset{\rightarrow}{d}}_{t}} \right)\left( {\hat{\overset{\rightarrow}{d}} - {\overset{\rightarrow}{d}}_{t}} \right)^{t}} \right\rbrack}} \\ {= {{E\left\lbrack {\left( {\hat{\overset{\rightarrow}{d}} - {\overset{\rightarrow}{d}}_{t}} \right)\left( {\hat{\overset{\rightarrow}{d}} - {\overset{\rightarrow}{d}}_{t}} \right)^{t}} \right\rbrack} + \underset{\underset{= {0{({bias})}}}{︸}}{\left( {{E\left\lbrack \hat{\overset{\rightarrow}{d}} \right\rbrack} - {\overset{\rightarrow}{d}}_{t}} \right)\left( {{E\left\lbrack \hat{\overset{\rightarrow}{d}} \right\rbrack} - \overset{\rightarrow}{d}} \right)^{t}}}} \\ {{= \Sigma_{\hat{d}}},} \end{matrix} & (49) \end{matrix}$ where Σ_({circumflex over (d)}) denotes the covariance matrix of the model predictions.

Assuming the warp formulated in Sec. 1 models physical reality closely, the model predictions based on image correlation data will be unbiased making Eq. (49) valid. In practice, the bias term will be small and the covariance of the model output will accurately predict the MSE.

We will derive the expression for Σ_({circumflex over (d)}) for the most general case including the smoothing model and robust fitting. Reducing to the special cases of no smoothing model with and without robust fitting is then trivial. Let W denote the weighting matrix for the data points that is obtained by multiplying the specified weights (if any), which are ideally set to the inverse covariance matrix of the input data for optimality, with the weights obtained during robust fitting (if enabled). If the input data has no specified weights and robust fitting is not performed, this will be the identity matrix. As before, W denotes the augmented matrix with the smoothing weights included if the smoothing model is employed. Then the model prediction can be obtained using Eqs. (10) and (47) {circumflex over ({right arrow over (d)})}=A{circumflex over ({right arrow over (p)})}=A(Ā ^(t) WĀ)⁻¹ Ā ^(t) W {right arrow over ( d )}=F{right arrow over (d)}  (50) where the filtering matrix F=A(Ā ^(t) WĀ)⁻¹ Ā ^(t) W   (51) maps the input warp data to the output model prediction. Using Eq. (50), the model prediction covariance can be computed as

$\begin{matrix} \begin{matrix} {E_{\hat{d}} = {E\left\lbrack {\left( {\hat{\overset{\rightarrow}{d}} - {\overset{\rightarrow}{d}}_{t}} \right)\left( {\hat{\overset{\rightarrow}{d}} - {\overset{\rightarrow}{d}}_{t}} \right)^{t}} \right\rbrack}} \\ {= {E\left\lbrack {\left( {{F\overset{\rightarrow}{\overset{\_}{d}}} - {{FE}\left\lbrack \overset{\rightarrow}{\overset{\_}{d}} \right\rbrack}} \right)\left( {{F\overset{\rightarrow}{\overset{\_}{d}}} - {{FE}\left\lbrack \overset{\rightarrow}{\overset{\_}{d}} \right\rbrack}} \right)^{t}} \right\rbrack}} \\ {{= {F\;\Sigma_{\overset{\_}{d}}F^{t}}},} \end{matrix} & (52) \end{matrix}$ where E _(d) is the covariance of the input augmented data. If we trust the input data covariance supplied by the user (no robust fitting performed), Eq. (52) gives us the covariance of the prediction for the most general case.

However, in practice, the input data covariance estimated from image correlation process may be inaccurate for some number of points. The robust fitting procedure of Sec. 1.2.5 is designed precisely to identity such outlier data points. The robust fitting weights can be regarded as corrections to the weights derived from the supplied input data covariance matrix. In essence, the overall weight matrix W, which includes the robust fitting weights, is a refined estimate of the inverse covariance of the input data up to a scale factor. The absolute scale of the input covariance cannot be estimated as part of the robust fitting procedure since only the relative weights between the data points matter in the fitting procedure. In practice, the data weights are usually normalized to unity mean to make them unitless. This makes it easier to specify a fixed amount of smoothing through the smoothing weights independent of the scale of the input data. The estimated input data covariance in terms of the weights is given as {circumflex over (Σ)} _(d) =σ_(d) ² W ⁻¹  (53) where σ_(d) ² is an unknown scale factor that needs to be estimated. Substituting Eqs. (53) and (51) in (52), we obtain

$\begin{matrix} \begin{matrix} {\Sigma_{\hat{d}} = {{A\left( {{\overset{\_}{A}}^{t}\overset{\_}{W}\overset{\_}{A}} \right)}^{- 1}{\overset{\_}{A}}^{t}\overset{\_}{W}\sigma_{d}^{2}{\overset{\_}{W}}^{- 1}\overset{\_}{W}{\overset{\_}{A}\left( {{\overset{\_}{A}}^{t}\overset{\_}{W}\overset{\_}{A}} \right)}^{- 1}A^{t}}} \\ {= {\sigma_{d}^{2}{A\left( {{\overset{\_}{A}}^{t}\overset{\_}{W}\overset{\_}{A}} \right)}^{- 1}{A^{t}.}}} \end{matrix} & (54) \end{matrix}$ 1.4.1 Estimation of σ_(d) ²

The residual of the augmented data is given as {right arrow over ( d )}−{circumflex over ({right arrow over ( d )})}=(I−F ){right arrow over ( d )}  (55) where F=Ā(Ā ^(t) WĀ)⁻¹ Ā ^(t) W   (56) denotes the augmented filtering matrix on the same lines as Eq. (51). Given the linear relationship of the residual to the data and assuming Eq. (53) holds, the residual covariance can be simplified as

$\begin{matrix} \begin{matrix} {{E\left\lbrack {\left( {\overset{\rightarrow}{\overset{\_}{d}} - \hat{\overset{\rightarrow}{\overset{\_}{d}}}} \right)\left( {\overset{\rightarrow}{\overset{\_}{d}} - \hat{\overset{\rightarrow}{\overset{\_}{d}}}} \right)^{t}} \right\rbrack} = {\left( {I - \overset{\_}{F}} \right){\Sigma_{\overset{\_}{d}}\left( {I - \overset{\_}{F}} \right)}^{t}}} \\ {= {\Sigma_{\overset{\_}{d}} - {\overset{\_}{F}}_{\Sigma_{\overset{\_}{d}}} - {\Sigma_{\overset{\_}{d}}{\overset{\_}{F}}^{t}} + {\overset{\_}{F}\Sigma_{\overset{\_}{d}}{\overset{\_}{F}}^{t}}}} \\ {= {\Sigma_{\overset{\_}{d}}\left( {I - \overset{\_}{F}} \right)}^{t}} \\ {{= {{\sigma_{d}^{2}\left( {I - \overset{\_}{F}} \right)}{\overset{\_}{W}}^{- 1}}},} \end{matrix} & (57) \end{matrix}$ where the third line uses the identity FΣ _(d) =FΣ _(d) F ^(t). Rearranging the terms in Eq. (57) gives

$\begin{matrix} {{\sigma_{d}^{2}I} = {{\overset{\_}{W}\left( {I - \overset{\_}{F}} \right)}^{- 1}{{E\left\lbrack {\left( {\overset{\rightarrow}{\overset{\_}{d}} - \hat{\overset{\rightarrow}{\overset{\_}{d}}}} \right)\left( {\overset{\rightarrow}{\overset{\_}{d}} - \hat{\overset{\rightarrow}{\overset{\_}{d}}}} \right)^{t}} \right\rbrack}.}}} & (58) \end{matrix}$ Taking the trace of both sides and approximating the expectation with a single instance of the data (replacing the expectation with a single instance is a very crude approximation, but we make up for that by the trace operation that averages all the data points), we obtain

$\begin{matrix} \begin{matrix} {{\hat{\sigma}}_{d}^{2} = {\frac{1}{m + n_{a}}{{Tr}\left\lbrack {{\overset{\_}{W}\left( {I - \overset{\_}{F}} \right)}^{- 1}\left( {\overset{\rightarrow}{\overset{\_}{d}} - \hat{\overset{\rightarrow}{\overset{\_}{d}}}} \right)\left( {\overset{\rightarrow}{\overset{\_}{d}} - \hat{\overset{\rightarrow}{\overset{\_}{d}}}} \right)^{t}} \right\rbrack}}} \\ {{= {\frac{1}{m + n_{a}}\left( {\overset{\rightarrow}{\overset{\_}{d}} - \hat{\overset{\rightarrow}{\overset{\_}{d}}}} \right)^{t}{\overset{\_}{W}\left( {I - \overset{\_}{F}} \right)}^{- 1}\left( {\overset{\rightarrow}{\overset{\_}{d}} - \hat{\overset{\rightarrow}{\overset{\_}{d}}}} \right)}},} \end{matrix} & (59) \end{matrix}$ where n_(a) is the number of smoothing constrains that augment the data. This can be looked at as the ML estimate of σ_(d) ² with residuals being modeled as zero-mean Gaussian distributed with covariance σ_(d) ²(I−F)W ⁻¹. Since I−F is a full matrix, the residuals are very strongly correlated (at least locally) and nowhere close to independent. However, in practice, approximating the residuals to be independent for the purpose of estimating σ_(d) ² does not result in a significant degradation and has the advantage of saving us the computation of inverting I−F, which is a large (m+n_(a))×(m+n_(a)) matrix. There are numerical issues involved in the inversion of I−F as well since it does not have full rank. A pseudo inverse can be employed but in practice it is better to approximate it as a diagonal matrix and just invert the diagonal entries for the purpose of estimating scale. Approximating independence and restricting the trace to the data portion yields

$\begin{matrix} {{\hat{\sigma}}_{d}^{2} \approx {\frac{1}{m}{\sum\limits_{i = 1}^{m}\;{\frac{W_{ii}}{1 - F_{ii}}{\left( {d_{i} - {\hat{d}}_{i}} \right)^{2}.}}}}} & (60) \end{matrix}$ Note that Eqs. (54) and (60) taken together make the estimated covariance {circumflex over (Σ)}_({circumflex over (d)}) of the output prediction invariant to the scale of weights specified for the input data as well as the weights estimated during robust fitting. This a desirable property as there may be errors in the estimation of the scale of the input data covariance due to modeling errors. 1.4.2 Unbiased Estimate of σ_(d) ²

The noise estimate given by Eq. (60) can potentially be biased lower when robust fitting is performed. Large outliers are suppressed by dialing down the robust weights but in the process other sample points that fall in the true distribution are also de-weighted significantly if they are far away from the mean of the distribution. The exact expression for the bias is difficult to compute analytically when robust fitting is performed without making simplifying assumptions that are typically not valid in practice. Instead, we will use our intuition to come up with a correction factor that accounts for this bias and verify the resulting expression via Monte Carlo simulations.

When computing statistics from N independent samples, we know that the variance of the estimated statistic goes down as 1/N, i.e., each sample has an effective contribution of 1. In robust fitting, we have a weight assigned to each sample i at iteration l, w_(i) ^((l)), where 0≤w_(i) ^((l))≤1. We can look at this weight as the effective contribution of sample i to reducing the variance of the estimated quantity. As the sample moves further away from the mean of the distribution, it contributes less and less to the result increasing the variance of the final estimate. The ratio of effective number of data samples and the actual number of data samples can then serve as a bias correction factor for our estimated noise. Towards this end, we define

$\begin{matrix} {{K = {\frac{1}{m}{\sum\limits_{i = 1}^{m}\; w_{i}^{{(*})}}}},} & (61) \end{matrix}$ where w_(i) ^((*)) denotes the robust weight assigned to sample i at convergence.

Also, note that an estimate of the input data noise similar to that given by Eq. (60) was obtained as part of robust fitting. See Sec. 1.2.6 and the robust estimate of noise given by Eq. (28). We can utilize this estimate along with the heuristic correction factor K to obtain an unbiased estimate

$\begin{matrix} {{{\hat{\sigma}}_{d}^{2} = {\frac{1}{K}{\hat{\sigma}}^{2{{(*})}}}},} & (62) \end{matrix}$ where {circumflex over (σ)}^((*)) is the estimated noise in robust fitting at convergence. Alternatively, we can obtain an unbiased estimate from Eq. (60) and our heuristic correction factor K as follows (Note we use K² rather than K to normalize here since the residuals are multiplied by the robust weights in the numerator)

$\begin{matrix} {{{\hat{\sigma}}_{d}^{2} = {\frac{1}{K_{2}\left( {m + n_{a} - n_{p}} \right)}{\sum\limits_{i = 1}^{m}\;{\frac{W_{ii}}{1 - F_{ii}}\left( {d_{i} - {\hat{d}}_{i}} \right)^{2}}}}},} & (63) \end{matrix}$ where we have accounted for the number of degrees of freedom in the model (n_(p)) and the total number of constraints (m data points and n_(a) smoothing constraints). Typically, the number of smoothing constraints is equal to the number of parameters so the normalization is close to m as in Eq. (60). In practice, the two estimates given by Eqs. (62) and (63) are very close to each other. In case no robust fitting is performed, the unbiased noise estimate is obtained by Eq. (63) with K=1. 2 Image Registration Algorithm

FIG. 3 shows an exemplary data flow of the overall registration method 10. The image correlation process 18 done block-wise fashion across the image provides the warp estimates between the two bands, e.g. reference image 12 and warped image 14. These data estimates are noisy due to imperfect correlation and may be grossly off when the image content does not correlate well across the two spectral bands. The image correlation process may optionally also estimate the uncertainty associated with the estimated warp data. Those skilled in the art will readily appreciate that that any suitable method for estimating the warp and the associated uncertainty from image data can be used. These noisy warp data 20 estimates along with the uncertainty information 21 is fed to the separable spline warp model 16 to optimally filter the data (as explained above in Sec. 1) and correct the errors made in the image correlation process 18. Details of this estimation process are provided in FIGS. 4 and 5 as described below. The spline warp model produces reduced noise or de-noised warp estimates 22 and the associated uncertainty 23 with the data. These estimates may optionally be combined 25 with an independent warp estimate obtained from platform motion data to produce refined warp 27. Details of how the motion data is incorporated into the warp estimate is provided in FIG. 6, as described below. The refined warp data 27 is then fed to an image resampling module 24 that resamples the warped band/image 14 to produce the final registered image 26.

The internal data flow within the spline warp model 16 is shown in FIG. 4. The warp model is first fitted 126 to the data 20 to obtain the unknown model parameters 28. If no uncertainty estimates are available for the warp data, the model fitting is performed using regular least squares as in eq. (13). If uncertainty estimates 21 are available, then either eqs. (17) or (19) are employed. If the input uncertainty estimates 21 are unreliable and the warp data has outliers, a robust fitting procedure is employed and eq. (23) is used after the robust fitting procedure has converged. If the smoothing model is employed for the spline coefficients then fitting is done using eq. (45) when the input uncertainty information is not available and eq. (47) when either the input uncertainty is available or reconstructed using robust fitting. The warp model is then evaluated 30 at the grid locations required by the image resampling routine to dewarp the warped image 14 and register it to the reference image 12 (shown in FIG. 3). The estimated parameters 28 are used in eqs. (4), (5) and (6) to compute the slope, offset, and optical distortion profiles respectively and then eq. (3) is used to compute the de-noised warp 22 at the desired grid locations. The uncertainty of the de-noised warp 23 is predicted 32 using the fitted model parameters 28, the noisy warp data 20 and its associated uncertainty 21.

FIG. 5 shows the details for the uncertainty estimation process 32 for predicting the uncertainty of the de-noised warp 22. If robust fitting is performed, first the warp data uncertainties 21 are corrected 50 using the robust fitting weights 36 that identify the outliers to produce the corrected relative uncertainties 38. The robust fitting weights are given by eq. (30). This correction process is skipped if no robust fitting is performed. The fitted model parameters are used to obtain the filtering matrix in eq. (51), which is then applied to the corrected relative uncertainty 38 using eq. (52) by process 52 to obtain the de-noised warp relative uncertainty 40. Finally, the noise on the input warp data is estimated 42 by comparing the de-noised warp data 48, e.g., which can be the same as the de-noised warp 22 of FIG. 4, to the input noisy warp data 20 with the knowledge of the fitted parameters 49, e.g., which can be the same as the parameters 28 of FIG. 4, and the corrected relative uncertainty 38. In particular, the noise variance is estimated using eq. (60) when no robust fitting is performed and eqs. (62) or (63) is used when robust fitting is performed. This noise estimate is multiplied 54 with the de-noised warp relative uncertainty 40 to obtain the final de-noised warp uncertainty 23.

FIG. 6 shows an exemplary embodiment where the de-noised warp uncertainty 23 of the predicted de-noised warp 22 can be utilized to further refine the warp if there is an alternate method of computing the warp. This refined warp 27 can be used instead of the de-noised warp 22 to perform the image resampling and register the warped image 14 with respect to the reference image 12. If motion data 56 is available for the platform, e.g., an aircraft/space platform, an estimate of the warp 60 between the reference and warp image can be obtained by utilizing a sensor model 58. Such a method of using a sensor model to produce a warp is understood by those skilled in the art. The knowledge of the noise in the measurements of the platform motion can also be utilized to obtain the uncertainty 62 of the warp computed from the motion data 56. It has been explained above how the de-noised warp data and its associated uncertainty can be obtained from image data. These two independent estimates of the warps and their associated uncertainty can be optimally combined 29 to produce a final refined warp 27. Let {circumflex over ({right arrow over (g)})} denote the warp estimate obtained from motion data and let Σ_(ĝ) denote the associated uncertainty of this data. Then the optimal estimate of the overall warp {circumflex over ({right arrow over (h)})} (refined warp) is obtained as {circumflex over ({right arrow over (h)})}=(Σ_({circumflex over (d)}) ⁻¹+Σ_(ĝ) ⁻¹)⁻¹(Σ_({circumflex over (d)}) ⁻¹{circumflex over ({right arrow over (d)})}+Σ_(ĝ) ⁻¹{circumflex over ({right arrow over (g)})})  (64) where {circumflex over ({right arrow over (d)})} is the de-noised warp estimate from image data obtained from eq. (50) and E_({circumflex over (d)}) is the associated uncertainty of this data obtained from eq. (54).

Those skilled in the art will readily appreciate that a system for registering images can include a module configured to implement machine readable instructions to perform one or more of the method embodiments described above.

As will be appreciated by those skilled in the art, aspects of the present embodiments may be embodied as a system, method or computer program product. Accordingly, aspects of the present embodiments may take the form of an entirely hardware embodiment, an entirely software embodiment (including firmware, resident software, micro-code, etc.) or an embodiment combining software and hardware aspects that may all generally be referred to herein as a “circuit,” “module” or “system.” Furthermore, aspects of the present disclosure may take the form of a computer program product embodied in one or more computer readable medium(s) having computer readable program code embodied thereon.

Any combination of one or more computer readable medium(s) may be utilized. The computer readable medium may be a computer readable signal medium or a computer readable storage medium. A computer readable storage medium may be, for example, but not limited to, an electronic, magnetic, optical, electromagnetic, infrared, or semiconductor system, apparatus, or device, or any suitable combination of the foregoing. More specific examples (a non-exhaustive list) of the computer readable storage medium would include the following: an electrical connection having one or more wires, a portable computer diskette, a hard disk, a random access memory (RAM), a read-only memory (ROM), an erasable programmable read-only memory (EPROM or Flash memory), an optical fiber, a portable compact disc read-only memory (CD-ROM), an optical storage device, a magnetic storage device, or any suitable combination of the foregoing. In the context of this document, a computer readable storage medium may be any tangible medium that can contain, or store a program for use by or in connection with an instruction execution system, apparatus, or device.

A computer readable signal medium may include a propagated data signal with computer readable program code embodied therein, for example, in baseband or as part of a carrier wave. Such a propagated signal may take any of a variety of forms, including, but not limited to, electro-magnetic, optical, or any suitable combination thereof. A computer readable signal medium may be any computer readable medium that is not a computer readable storage medium and that can communicate, propagate, or transport a program for use by or in connection with an instruction execution system, apparatus, or device.

Program code embodied on a computer readable medium may be transmitted using any appropriate medium, including but not limited to wireless, wireline, optical fiber cable, RF, etc., or any suitable combination of the foregoing.

Computer program code for carrying out operations for aspects of the present disclosure may be written in any combination of one or more programming languages, including an object oriented programming language such as Java, Smalltalk, C++ or the like and conventional procedural programming languages, such as the “C” programming language or similar programming languages. The program code may execute entirely on the user's computer, partly on the user's computer, as a stand-alone software package, partly on the user's computer and partly on a remote computer or entirely on the remote computer or server. In the latter scenario, the remote computer may be connected to the user's computer through any type of network, including a local area network (LAN) or a wide area network (WAN), or the connection may be made to an external computer (for example, through the Internet using an Internet Service Provider).

Aspects of the present disclosure are described above with reference to flowchart illustrations and/or block diagrams of methods, apparatus (systems) and computer program products according to embodiments of the embodiments. It will be understood that each block of the flowchart illustrations and/or block diagrams, and combinations of blocks in the flowchart illustrations and/or block diagrams, can be implemented by computer program instructions. These computer program instructions may be provided to a processor of a general purpose computer, special purpose computer, or other programmable data processing apparatus to produce a machine, such that the instructions, which execute via the processor of the computer or other programmable data processing apparatus, create means for implementing the functions/acts specified in the flowchart and/or block diagram block or blocks.

These computer program instructions may also be stored in a computer readable medium that can direct a computer, other programmable data processing apparatus, or other devices to function in a particular manner, such that the instructions stored in the computer readable medium produce an article of manufacture including instructions which implement the function/act specified in the flowchart and/or block diagram block or blocks.

The computer program instructions may also be loaded onto a computer, other programmable data processing apparatus, or other devices to cause a series of operational steps to be performed on the computer, other programmable apparatus or other devices to produce a computer implemented process such that the instructions which execute on the computer or other programmable apparatus provide processes for implementing the functions/acts specified in a flowchart and/or block diagram block or blocks

The methods and systems of the present disclosure, as described above and shown in the drawings, provide for imagery with superior properties including improved multi-band/multi-spectral image registration. While the apparatus and methods of the subject disclosure have been shown and described with reference to preferred embodiments, those skilled in the art will readily appreciate that changes and/or modifications may be made thereto without departing from the scope of the subject disclosure. 

What is claimed is:
 1. A method for modeling warp for registration of images comprising: receiving input warp data; and performing a fitting process on the input warp data to produce at least one of reduced noise warp data or reduced noise warp uncertainty, wherein the warp for the at least one of reduced noise warp data or reduced noise warp uncertainty is modeled with components including: an offset that varies in time; and a non-linear distortion that does not vary with time; outputting the at least one of reduced noise warp data or reduced noise warp uncertainty; predicting reduced noise warp uncertainty and utilizing the reduced noise warp uncertainty to generate a refined warp by incorporating motion data; and registering a warped band using the refined warp.
 2. A method as recited in claim 1, wherein the warp is modeled with components including a slope that varies in time.
 3. A method as recited in claim 2, wherein performing the fitting process includes modeling the slope and offset as splines.
 4. A method as recited in claim 3, wherein a smoothing model for slope and offset spline coefficients is incorporated in the fitting process.
 5. A method as recited in claim 4, wherein incorporating a smoothing model in the fitting process includes minimizing curvature of the offset and slope splines.
 6. A method as recited in claim 3, wherein the splines have non-uniformly placed knots.
 7. A method as recited in claim 3, wherein the splines are cubic.
 8. A method as recited in claim 2, wherein the fitting process includes modeling the warp as d _(x)(x,y)=s _(x)(y)x+c _(x)(y)+o _(x)(x) d _(y)(x,y)=s _(y)(y)x+c _(y)(y)+o _(y)(x) where d_(x)(⋅,⋅) and d_(y)(⋅,⋅) are x and y components of the warp, respectively, s_(x)(⋅) and c_(x)(⋅) denote slope and offset as a function of time for the x component of the warp, s_(y)(⋅) and c_(y)(⋅) denote the slope and offset as a function of time for the y component of the warp, and o_(x)(⋅) and o_(y)(⋅) denote distortion that is constant in time but varies along a sensor array, and y in the functions denotes time and x in the functions denotes distance along the sensor array.
 9. A method as recited in claim 1, wherein performing the fitting process includes modeling the non-linear distortion as a polynomial.
 10. A method as recited in claim 1, further comprising: receiving input warp data uncertainty for the input warp data, wherein performing the fitting process includes using the input warp data uncertainty in producing the reduced noise warp data.
 11. A method as recited in claim 10, wherein performing the fitting process includes correcting errors in the input warp data uncertainty by performing a robust fitting process on the input warp data.
 12. A method as recited in claim 10, further comprising: predicting reduced noise warp uncertainty using the input warp data uncertainty and utilizing the reduced noise warp uncertainty to generate a refined warp by incorporating motion data; and registering a warped band using the refined warp.
 13. A method as recited in claim 12, further comprising using a sensor model to produce warp data from motion and motion warp uncertainty, wherein the refined warp is produced by combining the reduced noise warp, the reduced noise warp uncertainty, the warp data from motion, and the motion warp uncertainty.
 14. A method as recited in claim 13, wherein combining the reduced noise warp, the reduced noise warp uncertainty, the warp data from motion, and the motion warp uncertainty includes obtaining the refined warp {circumflex over ({right arrow over (h)})} as {circumflex over ({right arrow over (h)})}=(Σ_({circumflex over (d)}) ⁻¹+Σ_(ĝ) ⁻¹)⁻¹(Σ_({circumflex over (d)}) ⁻¹{circumflex over ({right arrow over (d)})}+Σ_(ĝ) ⁻¹ {circumflex over ({right arrow over (g)})}) where {circumflex over ({right arrow over (d)})} is the reduced noise warp, Σ_({circumflex over (d)}) is the reduced noise warp uncertainty, {circumflex over ({right arrow over (g)})} denotes the warp data from motion, and Σ_(ĝ) denotes the motion warp uncertainty.
 15. A method as recited in claim 10, further comprising: using the input warp data uncertainty to produce relative input warp data uncertainty; using the relative input warp data uncertainty in a warp model robust fit to produce fit weights; using the fit weights and relative input warp data uncertainty to produce corrected relative input warp data uncertainty; predicting relative reduced noise warp data uncertainty from the corrected relative input warp data uncertainty; using the reduced noise warp data and the input warp data to estimate input noise variance; and multiplying the relative reduced noise warp data uncertainty and the input noise variance to produce the reduced noise warp data uncertainty.
 16. A system including: a module configured to implement machine readable instructions to: receive input warp data; and perform a fitting process on the input warp data to produce at least one of reduced noise warp data or reduced noise warp uncertainty, wherein the warp for the at least one of reduced noise warp data or reduced noise warp uncertainty is modeled with components including: an offset that varies in time; and a non-linear distortion that does not vary with time; output the at least one of reduced noise warp data or reduced noise warp uncertainty; predict reduced noise warp uncertainty and utilizing the reduced noise warp uncertainty to generate a refined warp by incorporating motion data; and register a warped band using the refined warp.
 17. A system as recited in claim 16, wherein the module is configured to implement machine readable instructions wherein performing the fitting process includes modeling the slope and offset as splines.
 18. A system as recited in claim 16, wherein the module is configured to implement machine readable instructions wherein performing the fitting process includes modeling the non-linear distortion as a polynomial.
 19. A system as recited in claim 16, wherein the module is configured to implement machine readable instructions including instructions to: receive a reference image and a warped image; correlate the reference and warped images to produce noisy warp data; perform a fitting process on the noisy warp data to produce reduced noise warp data, wherein the warp for the reduced noise warp data is modeled with components including: an offset that varies in time; and a non-linear distortion that does not vary with time; and resampling the warped image using the reduced noise warp data to produce a registered image that is registered to the reference image.
 20. A method for registration of images comprising: receiving a reference image and a warped image; correlating the reference and warped images to produce noisy warp data; performing a fitting process on the noisy warp data to produce reduced noise warp data, wherein the warp for the reduced noise warp data is modeled with components including: an offset that varies in time; and a non-linear distortion that does not vary with time; resampling the warped image using the reduced noise warp data to produce a registered image that is registered to the reference image receiving motion data; and combining the motion data with the reduced noise warp and the reduced noise warp uncertainty to produce a refined warp, wherein resampling the warped image using the reduced noise warp data includes using the refined warp to produce the registered image.
 21. A method as recited in claim 20, wherein correlating the reference and warped images includes producing warp data uncertainty.
 22. A method as recited in claim 21, wherein performing the fitting process includes generating reduced noise warp uncertainty. 